1. A method comprising:

receiving a first program unit in a parallel computing environment having a team of parallel threads including at least a first and a second thread, the first program unit including a memory copy operation to be performed between the first thread and the second thread; and

translating the first program unit into a second program unit, the second program unit to associate the memory copy operation with a set of one or more instructions, the set of instructions to ensure that the second thread copies data based, in part, on a first descriptor associated with the first thread.


2. The method of claim 1, further comprising copying the address of the first descriptor to a buffer and copying data into a memory area associated with the second thread based, in part, on address and data information associated with the first descriptor.

3. The method of claim 2, further comprising copying data into a memory area associated with the second thread utilizing, in part, a second descriptor associated with the second thread.
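Claims 1 through 3 describe the copy as driven by descriptors: the first descriptor carries address and data information for the source, and a second descriptor, associated with the second thread, locates the destination. The Python sketch below models that idea under stated assumptions: a descriptor is a hypothetical list of `(region, length)` pairs, and `bytearray` objects stand in for each thread's memory areas. The names `Descriptor` and `copy_by_descriptors` are illustrative only and are not taken from the claims.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Descriptor:
    # Hypothetical layout: each entry pairs a memory region (a bytearray
    # standing in for an address) with the number of bytes it covers.
    entries: List[Tuple[bytearray, int]] = field(default_factory=list)


def copy_by_descriptors(src: Descriptor, dst: Descriptor) -> int:
    """Copy every region listed in `src` into the matching region in `dst`.

    `src` plays the role of the first descriptor (owned by the first thread);
    `dst` plays the role of the second descriptor (owned by the second thread).
    Returns the total number of bytes copied.
    """
    if len(src.entries) != len(dst.entries):
        raise ValueError("descriptors must list the same number of regions")
    total = 0
    for (s_region, s_len), (d_region, d_len) in zip(src.entries, dst.entries):
        n = min(s_len, d_len)            # never write past the destination
        d_region[:n] = s_region[:n]      # in-place copy into the second thread's area
        total += n
    return total


# Usage: the first thread's data is described by `first`; the second
# thread's receiving areas are described by `second`.
a, b = bytearray(b"alpha"), bytearray(b"42")
x, y = bytearray(5), bytearray(2)
first = Descriptor([(a, len(a)), (b, len(b))])
second = Descriptor([(x, len(x)), (y, len(y))])
copied = copy_by_descriptors(first, second)
```

Keeping the destination layout in its own descriptor means the second thread never needs to know how the first thread arranged its data beyond what the first descriptor states.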

4. The method of claim 1, further comprising enabling the first thread to copy an address of the first descriptor to a buffer and setting a signal to enable the second thread to copy data associated with the first descriptor to a memory area associated with the second thread.

5. The method of claim 4, further comprising enabling the first thread to enter a wait state after the signal is set.

6. The method of claim 5, further comprising releasing the first thread from the wait state upon completion of the data copy operation by the second thread.
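Claims 4 through 6 add a handshake: the first thread copies the descriptor address into a buffer, sets a signal, and waits; it is released once the second thread has finished copying. A minimal sketch of that ordering with Python's `threading.Event`, assuming a one-slot list as the buffer and a dictionary as the descriptor (both hypothetical stand-ins; `handshake_copy` is an illustrative name):

```python
import threading


def handshake_copy(private_data: bytes):
    """Broadcast `private_data` from a first thread to a second thread."""
    buffer = [None]                 # hypothetical shared buffer for the descriptor
    signal = threading.Event()      # set by the first thread once the buffer is filled
    copy_done = threading.Event()   # set by the second thread after its copy
    received = bytearray()          # memory area associated with the second thread
    log = []                        # records the order in which the steps happen

    def first_thread():
        descriptor = {"src": private_data, "len": len(private_data)}
        buffer[0] = descriptor      # copy the descriptor "address" into the buffer
        log.append("published")
        signal.set()                # enable the second thread to copy (claim 4)
        copy_done.wait()            # wait state after the signal is set (claim 5)
        log.append("released")      # released once the copy completes (claim 6)

    def second_thread():
        signal.wait()               # block until the descriptor is available
        desc = buffer[0]
        received[:] = desc["src"][: desc["len"]]
        log.append("copied")
        copy_done.set()             # let the first thread leave its wait state

    workers = [threading.Thread(target=first_thread),
               threading.Thread(target=second_thread)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return bytes(received), log


data, order = handshake_copy(b"payload")
```

The wait matters because the descriptor points into the first thread's memory: if the first thread continued and modified or released that memory before the copy finished, the second thread could read stale or invalid data.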



7. The method of claim 5, further comprising enabling the first thread to copy an address of the first descriptor to one of two buffer areas.
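Claim 7 lets the first thread place the descriptor address in one of two buffer areas. The claim does not say how the area is chosen; a plausible reason for two areas is to alternate between them, so that a new copy operation can publish its descriptor while another thread may still be reading the previous one. The sketch below assumes that alternating policy; `DoubleBuffer` is a hypothetical name.

```python
class DoubleBuffer:
    """Two descriptor slots, used alternately by successive copy operations."""

    def __init__(self):
        self.slots = [None, None]
        self.next_slot = 0

    def publish(self, descriptor_address):
        """Store an address in the current slot and flip to the other slot."""
        slot = self.next_slot
        self.slots[slot] = descriptor_address
        self.next_slot ^= 1         # alternate 0 -> 1 -> 0 ...
        return slot

    def read(self, slot):
        return self.slots[slot]


buf = DoubleBuffer()
s0 = buf.publish(0x1000)    # first copy operation uses slot 0
s1 = buf.publish(0x2000)    # the next one uses slot 1, leaving slot 0 intact
```

With a single slot, the second publish would overwrite the first address immediately; alternating keeps the previous address readable for one more operation.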



8. The method of claim 1, further comprising receiving the first program unit in source code format and translating the first program unit into the second program unit in source code format.
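Claim 8 has the translation work source to source. As a toy illustration only, the sketch below rewrites a hypothetical directive, `#pragma copy_broadcast(x, y)`, into C source that builds a descriptor of variable addresses and sizes and calls a hypothetical runtime routine `__copy_broadcast`. Neither the directive nor the routine is taken from the claims; OpenMP's `copyprivate` clause is one real construct with a similar broadcast purpose.

```python
import re

# Hypothetical input directive; not part of any real compiler.
DIRECTIVE = re.compile(r"^(\s*)#pragma copy_broadcast\((\w+(?:\s*,\s*\w+)*)\)\s*$")


def translate(source: str) -> str:
    """Rewrite each copy directive into a descriptor plus a runtime call."""
    out = []
    for line in source.splitlines():
        m = DIRECTIVE.match(line)
        if m is None:
            out.append(line)        # ordinary source passes through unchanged
            continue
        indent = m.group(1)
        names = [n.strip() for n in m.group(2).split(",")]
        addrs = ", ".join(f"&{n}" for n in names)
        sizes = ", ".join(f"sizeof {n}" for n in names)
        # First descriptor: address and size information for each variable.
        out.append(f"{indent}void *desc[] = {{{addrs}}};")
        out.append(f"{indent}size_t desc_sizes[] = {{{sizes}}};")
        # Hypothetical runtime routine that performs the descriptor-based copy.
        out.append(f"{indent}__copy_broadcast(desc, desc_sizes, {len(names)});")
    return "\n".join(out)


translated = translate("int x, y;\n    #pragma copy_broadcast(x, y)\nuse(x, y);")
print(translated)
```

A real translator would also have to avoid redeclaring `desc` when several directives share a scope; the toy ignores that.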



9. A machine-readable medium that provides instructions that, when executed by a machine, enable the machine to perform operations comprising:

receiving a first program unit in a parallel computing environment, the first program unit including a memory copy operation to be performed between a first thread in a team of threads and a second thread in the team of threads; and

translating the first program unit into a second program unit, the second program unit to associate the memory copy operation with a set of one or more instructions, the set of instructions to ensure that the second thread copies data based, in part, on a first descriptor associated with the first thread.

10. The machine-readable medium of claim 9, further comprising copying the address of the first descriptor to a buffer and copying data into a memory area associated with the second thread based, in part, on address and data information associated with the first descriptor.



11. The machine-readable medium of claim 10, further comprising copying data into a memory area associated with the second thread utilizing, in part, a second descriptor associated with the second thread.

12. The machine-readable medium of claim 9, further comprising enabling the first thread to copy an address of the first descriptor to a buffer and setting a signal to enable the second thread to copy data associated with the first descriptor to a memory area associated with the second thread.

13. The machine-readable medium of claim 12, further comprising enabling the first thread to enter a wait state after the signal is set.

14. The machine-readable medium of claim 13, further comprising releasing the first thread from the wait state upon completion of the data copy operation by the second thread.

15. The machine-readable medium of claim 13, further comprising enabling the first thread to copy an address of the first descriptor to one of two buffer areas.

16. The machine-readable medium of claim 12, further comprising copying data into a memory area associated with the second thread utilizing, in part, a second descriptor associated with the second thread.

17. The machine-readable medium of claim 9, further comprising receiving the first program unit in source code format and translating the first program unit into the second program unit in source code format.

18. A method comprising:

receiving a first program unit in a parallel computing environment and translating the first program unit, in part, into one or more computer instructions, the instructions enabling a second thread in a team of threads to copy data, into a memory area associated with the second thread, from a private memory area associated with a first thread; and

copying the address of a descriptor into a buffer utilized by the second thread, in part, to copy data from the memory area associated with the first thread.

19. The method of claim 18, further comprising creating a descriptor utilized, in part, by the second thread to copy data into the memory area associated with the second thread.

20. The method of claim 19, further comprising setting a signal by the first thread enabling the second thread to copy the data from the memory area associated with the first thread.

21. The method of claim 20, further comprising entering a wait state by the first thread until the second thread copies the data from the memory area associated with the first thread.
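Claims 18 through 21 frame the same mechanism as copying out of the first thread's private memory area. The sketch below generalizes the handshake to a whole team, assuming Python's `threading.Barrier` serves as both the signal (claim 20) and the wait state (claim 21); thread 0 plays the first thread, and the function name `team_copy` is hypothetical. The claims speak of a single second thread; having every other team member copy is an assumption made here.

```python
import threading


def team_copy(n_threads: int, source_private: bytes):
    """Copy thread 0's private data into every other thread's private area."""
    privates = [bytearray() for _ in range(n_threads)]  # one private area per thread
    privates[0][:] = source_private   # thread 0 owns the data to be copied
    buffer = [None]                   # hypothetical shared buffer for the descriptor
    published = threading.Barrier(n_threads)
    copied = threading.Barrier(n_threads)

    def worker(tid: int):
        if tid == 0:
            # Descriptor: a reference to the private area plus its length.
            buffer[0] = (privates[0], len(privates[0]))
        published.wait()   # acts as the signal: the descriptor is now in the buffer
        if tid != 0:
            src, n = buffer[0]
            privates[tid][:] = src[:n]
        copied.wait()      # thread 0 cannot pass until every copy has finished

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [bytes(p) for p in privates]


result = team_copy(4, b"xyz")
```

A barrier is a coarser tool than the per-thread signal in the claims, since every thread waits at both points, but it keeps the ordering easy to check.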
22. An apparatus comprising:

a memory including a shared memory location; and

a translation unit coupled with the memory, the translation unit operative to associate a first program unit, including a memory copy operation to be performed between a first thread in a team of threads and a second thread in the team of threads, with a set of one or more instructions, the set of instructions to ensure that the second thread copies data based, in part, on a first descriptor associated with the first thread.

23. The apparatus as in claim 22, wherein the address of the first descriptor is copied to a buffer by the first thread and the second thread copies data into a memory area associated with the second thread based, in part, on address and data information associated with the first descriptor.



24. The apparatus as in claim 23, wherein the second thread copies data into a memory area associated with the second thread utilizing, in part, a second descriptor associated with the second thread.



25. The apparatus as in claim 22, wherein the first thread copies an address of the first descriptor to a buffer and sets a signal to enable the second thread to copy data associated with the first descriptor to a memory area associated with the second thread.

26. The apparatus as in claim 25, wherein the first thread enters a wait state after the signal is set.

27. The apparatus of claim 26, wherein the first thread exits the wait state after completion of the data copy by the second thread.

28. The apparatus of claim 22, wherein the first program unit is in source code format.

29. The apparatus of claim 28, wherein the first descriptor is passed to the first program unit.

30. The apparatus as in claim 22, wherein the translation unit translates the first program unit, in part, into a second program unit in source code format and the second program unit includes the memory copy operation.